1 Pre-Tutorial Homework

1.1 To-do:

1.1.1 Learn how to create a GitHub account

First, we need to create a GitHub account. To do so, visit GitHub, register with your BSU email address, and create an account for free. If you already have a GitHub account, just log in with your details!

1.1.2 Download and Install Git

Download and install Git on your local computer using this link. Choose your operating system and follow the directions given on the website.
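Once the installer has finished, you may want to confirm that R can find Git. The lines below are a minimal sketch using only base R; the messages are our own wording, and the exact version string you see will depend on your installation.

```r
# Look up the Git executable on the system PATH.
# Sys.which() returns an empty string when the program cannot be found.
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  # Ask Git to report its version
  git_version <- system2("git", "--version", stdout = TRUE)
  message("Git found at ", git_path, ": ", git_version)
} else {
  message("Git was not found. Install it, then restart RStudio.")
}
```

If Git is not found right after installing, restarting RStudio usually lets it pick up the new program.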

2 Introduction

Throughout this semester, we have discussed making our science reproducible and accessible. We started by learning the concept of reproducible science and why it is so important. We also learned that there is a reproducibility crisis, what might be causing it, and how to address it. To address writing reproducible code, we learned about R Markdown files and how to format them in a clear, reproducible way. We next learned how to make bibliographies, appendices, tables, figures, citations, and references in R Markdown and how to use settings to best display them. We learned about creating and using functions and lists in R and writing code in a clear, reproducible, and defensive way.

Then, we discussed the scientific method and how to mitigate the threats to each step of the scientific process. One mitigation strategy is following the TOP (Transparency and Openness Promotion) Guidelines; another is using the OSF (Open Science Framework). Other mitigation strategies include the FAIR and CARE principles. FAIR principles focus on making data accessible to everyone, whereas CARE principles apply specifically to data involving Indigenous Peoples and emphasize who benefits from the data, who can collect the data, and the responsibilities and ethics surrounding the data. The two sets of principles can pull in opposite directions, but they are equally important to consider when conducting research and before sharing data. Connected to this was the concept of Creative Commons licenses and which one would be best to use for your own research.

Finally, we learned about data management: that it is best to create a data management plan before starting research, and how to manage your data so that it will continue to be usable by you, your colleagues, and the general public, as long as such accessibility has been checked against the CARE principles.

One of the final pieces of making your science reproducible is making sure your data and code are accessible far into the future while respecting the FAIR and CARE principles. During this tutorial, we will introduce Git and GitHub, which will help make your science more reproducible.

2.1 Git and GitHub

Git is a version control system, which keeps track of the changes made to the files stored within it and allows us to return to previous versions. GitHub is a cloud hosting service built on top of Git. It can store data for you remotely, solving the issue of data storage. However, it was built for computer programmers rather than specifically for researchers, so at first glance GitHub can be challenging to understand and use. With some time, its many helpful features become clearer and can help us conduct more reproducible research.

One of these features is data storage and access. When you create a project, which GitHub calls a repository, you can store data and code along with a README.md file that explains your file structure. Repositories are particularly useful when several collaborators are working on the same project. GitHub allows files to be shared and edited, and tracks who edited which files and what they changed. If the repository is public, researchers unconnected to the project can also suggest changes to files, which could provide further checks that help make science more reproducible. GitHub will store versions of your files indefinitely and for free, and will connect to RStudio Projects, which we will discuss today. We created this tutorial with information from Gandrud (2020).

On Day 1, we will learn Key Terms for work in Git and GitHub, make a new repository in GitHub and clone it into a new RStudio Git version control project, and work with partners to practice working with GitHub.

On Day 2, we will look at active GitHub repositories that are nice examples of how your own GitHub repository might look in the future! We will practice importing data from one of these into RStudio and uploading data into your own GitHub repository.

3 Day 1

3.1 Learning Objectives

3.1.1 Set your RStudio Terminal to Git Bash

Make sure your settings are up to date by going to Tools > Global Options > Terminal. Under the General tab, set New terminals open with: to Git Bash, then click Apply.

3.1.2 Key Terms

As we work today and for your future Git adventures, here are a few useful terms and their definitions. You can use the search bar embedded in the table to search terms or any words in the definitions for further clarification and assistance. Pay specific attention to the definitions for Commit, Commit message, Pull, Push, and Repository.

3.1.3 Creating a remote repository in GitHub

Now that you have created a GitHub account, it is time to make your first repository! To do so,

  1. Click the green and white image in the upper right-hand corner of GitHub and look for the Repositories button.

  2. Click the green “New” button in the upper right-hand corner of the Repositories tab.

  3. Fill out the new repository form. Your username should be in the Owner section. Give the repository a name that is easy for you to remember; it cannot match the name of another repository in your account. GitHub will suggest fun and interesting names, which are unique but may be hard to remember and associate with your work, so choose carefully.

For the Configuration options:

  1. Create a public repository.
  2. Turn on Add README so that you’ll have a readme file in your repository.
  3. Turn on Add .gitignore file and choose R as the file template.
  4. Click Create Repository.
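To make the purpose of the .gitignore file more concrete: it is a plain text file listing files and folders that Git should not track. The sketch below writes a small example to a temporary folder and reads it back. The three patterns are illustrative choices that are commonly ignored in R projects; the template GitHub generates for you may contain different or additional entries, and the folder name `gitignore_demo` is made up for this example.

```r
# Illustrative patterns only; the template GitHub generates may differ.
ignore_patterns <- c(
  ".Rhistory",     # command history saved by R
  ".RData",        # saved workspace image
  ".Rproj.user/"   # RStudio's per-user project settings
)

# Write to a temporary folder so no real project is touched.
demo_dir <- file.path(tempdir(), "gitignore_demo")
dir.create(demo_dir, showWarnings = FALSE)
ignore_file <- file.path(demo_dir, ".gitignore")
writeLines(ignore_patterns, ignore_file)

# Read the file back to see the lines Git would use.
readLines(ignore_file)
```

Each line is one pattern, so you can later add your own entries (for example, a folder of large raw files) by editing the file in RStudio.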

3.1.4 Creating a local repository in RStudio

Just as you created a new project at the beginning of this class (or anytime you want to start a new project), you can make a new RStudio project that is linked to a remote repository in GitHub.

  1. Go back to GitHub, find the repository you just made, and click the green Code button in the upper right-hand corner. A panel will appear with a tab that says Clone. Under this tab, there is an HTTPS section. Make sure you are in this section, then press the copy icon (one square on top of another) inside the red circle in Figure 3.1.

Figure 3.1: Cloning Repository from GitHub into RStudio

  2. Create a new project in RStudio with Git version control by first choosing “Version Control” in the New Project wizard.

  3. Next, select the Git button to clone a project from a Git repository.

  4. Paste the URL you just copied into the box that says Repository URL inside the Clone Git Repository box. For the Project directory name, use the same name as your GitHub repository. Create the new project as a subdirectory of your Desktop, your Documents folder, or anywhere you normally create new RStudio projects.
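Behind the wizard, cloning comes down to a single Git command, `git clone`, followed by the URL and a destination folder. The sketch below assembles that command from R without running it. The URL and the `~/Documents` location are placeholders, not real values from this class, so substitute your own before uncommenting the last line.

```r
# Placeholder values: replace with the HTTPS URL you copied from GitHub
# and the folder where you keep your RStudio projects.
repo_url   <- "https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git"
parent_dir <- path.expand("~/Documents")

# Name the project folder after the repository, matching the advice
# to use the repository name as the project directory name.
project_dir <- file.path(parent_dir, sub("\\.git$", "", basename(repo_url)))

# Build the arguments for: git clone <url> <directory>
# shQuote() protects folder names that contain spaces.
clone_args <- c("clone", repo_url, shQuote(project_dir))

# Uncomment to run the clone (requires Git and an internet connection):
# system2("git", clone_args)
```

The wizard does one extra thing this sketch does not: it also creates the .Rproj file and opens the folder as an RStudio project.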

3.1.5 Turn on notifications for push changes

If you are the owner of the repository, when you and your collaborators make changes, you can see the changes if you have push notifications turned on. This is good for tracking and knowing what your collaborators are working on. To turn on push notifications:

  1. Click on Settings on the upper right while in the repository you just created (or any you need notifications for).

  2. Scroll down to the bottom of the list of settings and in the lower left corner, click on Email notifications. You should see the Email notifications page.

  3. Make sure to add the email address you would like to receive notifications on, check the box next to Active if not already checked, then press Update settings.

Unfortunately, only the admin or owner of the repository can get notifications when changes are made. So far, we have not found a way for collaborators to receive these notifications. We tried having the collaborator use the “Watch” button in the upper right to receive these messages, but that did not seem to work. This remains to be explored, as it can be a useful tool for remote communication.

3.1.6 Activity for Day 1 of the Tutorial

By now, you should have created a repository of your own and linked it with a matching project in RStudio. The next step is to practice working in GitHub with a collaborator.

In pairs, you will:

  1. Choose one of the pair to be the admin and one to be the collaborator.

  2. The admin will go to the settings of their repository, then click on Collaborators in the upper left area of the settings. You may be prompted to log in; once you do, you will reach the collaborators screen.

  3. Add your partner by clicking the Add people button and searching for your partner’s GitHub account name (check with them, since it may differ from their name, include numbers, etc.).

  4. Click Add to repository. Your partner should receive an email invitation to access your repository.

  5. Once the collaborator finds the admin’s repository, they click on the green Code button in the upper right-hand corner.

  6. Copy the HTTPS URL with the box button to the right of the URL. Follow the directions to clone the repository in RStudio as presented in Creating a local repository in RStudio.

  7. Once each of you has the repository open in RStudio, one partner goes first: write a sentence in the README.md, save your local file, commit the change (Figure 3.2), and push the change to the remote GitHub repository. The other partner then pulls that change with the down Pull arrow and adds their own sentence.

  8. Repeat this exercise until you have 3 sentences each (or 1 each if time is short).

  • Talk with each other as you work (Fiona and Tshia will push and pull and demonstrate on the projector).

3.1.7 Committing your changes cautiously

  • Avoid committing and pushing at the same time as your collaborator. If the remote repository contains changes you have not pulled yet, Git will reject your push with an error (Figure 3.2).
  • Make sure the box for the file you are working on is checked (blue tick) before you commit. This stages the file and enables its changes to be committed.
  • You will need to write a commit message in the commit box for Git to accept the commit, and it is helpful to summarize the changes you have made for your own future reference and for your collaborators. Remember, some of the power of GitHub is that changes can be tracked, but this feature is only helpful when you can easily pick out the version you are looking for from a long list of versions.
  • To avoid accidentally merging over or overwriting your changes or those of your collaborators, communicate with them when committing, pushing, and pulling changes. After creating this tutorial, we also found it helpful to keep a backup version (not associated with GitHub) saved as well.

Figure 3.2: Interface when pushing and committing changes
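The Staged checkbox, the Commit button, and the History view in RStudio each correspond to a Git command. The sketch below walks through the same stage-and-commit sequence in a throwaway folder, calling Git from R with `system2()`. It assumes Git is installed; the helper name `run_git`, the folder name `commit_demo`, and the placeholder author details are all made up for this illustration, and nothing is sent to GitHub.

```r
# A throwaway folder stands in for your cloned project.
repo <- file.path(tempdir(), "commit_demo")
dir.create(repo, showWarnings = FALSE)

# Hypothetical helper: run one Git command inside `repo`.
# -C sets the folder; the -c options supply a placeholder author
# so the sketch does not depend on your global Git settings.
run_git <- function(...) {
  system2("git",
          c("-C", shQuote(repo),
            "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
            ...),
          stdout = TRUE, stderr = TRUE)
}

if (nzchar(Sys.which("git"))) {
  run_git("init")                                   # start tracking the folder
  writeLines("First sentence from partner A.",
             file.path(repo, "README.md"))          # edit and save the file
  run_git("add", "README.md")                       # same as ticking Staged
  run_git("commit", "-m",
          shQuote("Add first sentence to README"))  # same as pressing Commit
  print(run_git("log", "--oneline"))                # same as viewing History
}
```

In a real clone, `git pull` and `git push` correspond to the Pull and Push arrows; they are left out here because the demo folder has no remote repository.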

4 Day 2

4.1 Learning Objectives

4.1.1 Activity for Day 2 of the Tutorial

On Day 1, we learned Key Terms for work in Git and GitHub, made a new repository in GitHub and cloned it into a new RStudio Git version control project, and worked with partners to practice making changes, committing them, pushing changes to the remote GitHub repository, and pulling changes from the remote GitHub repository to your local RStudio project files.

On Day 2, we will look at active GitHub repositories that are nice examples of how your own GitHub repository might look in the future! We will practice importing data from one of these into RStudio and uploading data into your own GitHub repository.

4.1.2 Checking out public repositories

As you read more papers, interact with collaborators, and become a scientist doing extremely reproducible science, you will likely need to bring data from a public GitHub repository onto your own computer and into R.

  • One example of a public repository is Sven’s repository for this class, Reproducible 603. Following the link, you can see his repository, with all the folders and files inside displayed in a list. If you scroll to the bottom, you can see the README with information about this repository/class.

  • Another example of a public repository is one that was made as part of a paper submission. Fiona read a cool microbial ecology paper that tracked the ecological and evolutionary responses of Curtobacterium over an elevation gradient in Southern California (Chase et al., 2021). In the paper, there is a section that says Data Availability. Following the link to their GitHub repository, we can see a similar set of folders and files, as well as a README.

4.1.3 Import data into R from a GitHub repository

  • For either repository, find a data file. Usually, you will have specific data in mind and may already know the file name you are looking for. However, sometimes repositories can be unclear. Use what we have learned in class to identify data, whether a .csv file, a .txt file, or a .xlsx file.

4.1.4 Downloading data, uploading data, importing into RStudio project

  1. Download the data file by clicking on the Download raw file button. It will now be on your computer.

  2. Practice uploading data to your own GitHub repository by returning to your repository, clicking the Add file button next to the green Code button, then choosing Upload files in the dropdown.

  3. You can drag files into the box or choose your files. Make sure to commit your changes in the section underneath the box.

  4. To import data into RStudio, go to your repository in GitHub.

  5. Find the data file we just uploaded and go to the file by clicking on it in the list of files in your repository.

  6. Click on the Raw button in the upper left.

  7. This should bring you to a page showing only the file contents. Copy the URL from the address bar at the top of your browser.

  8. Moving back to RStudio, in a new R script, run these lines of code:

# Paste the raw file URL between the quotes
url <- "PASTE HERE"
# Import the data from the URL (requires the rio package)
YOUR_data <- rio::import(url, format = "csv")

You should now see your data file in the environment, ready to be used!
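If you would rather build the URL than copy it, the address shown after clicking Raw generally follows a fixed pattern: the raw.githubusercontent.com host, then the account, repository, branch, and file path. The sketch below assumes that pattern still holds. The helper name `raw_github_url` and all of the values passed to it are placeholders, so swap in your own account, repository, branch, and file.

```r
# Hypothetical helper: assemble a raw-file URL from its parts.
raw_github_url <- function(owner, repo, branch, path) {
  paste("https://raw.githubusercontent.com", owner, repo, branch, path,
        sep = "/")
}

# Placeholder values; substitute your own account, repository, and file.
url <- raw_github_url("YOUR-USERNAME", "YOUR-REPOSITORY", "main",
                      "data/example.csv")
url

# Base R can also read a CSV straight from a URL, as an alternative to rio.
# Uncomment once `url` points at a real file:
# YOUR_data <- read.csv(url)
# head(YOUR_data)
```

Building the URL in code keeps the data source visible in your script, which makes it easier for a reader to see exactly where the data came from.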

5 Relevant resources

As a final send-off to all of you (and any future users of this tutorial), here is a list of resources we used to build this tutorial.

For the latest version of Gandrud’s textbook, visit this link, which will allow you to search Boise State University’s library and download an ebook: Reproducible Research with R and RStudio, 3rd edition

GitHub for dummies video

GitHub version control video

6 References

Chase, A.B., C. Weihe, and J.B.H. Martiny. 2021. Adaptive differentiation and rapid evolution of a soil bacterium along a climate gradient. Proceedings of the National Academy of Sciences 118: e2101254118. Available at: https://www.pnas.org/doi/abs/10.1073/pnas.2101254118.
Gandrud, C. 2020. Reproducible Research with R and RStudio. CRC Press.

Appendices

A Appendix 1

Citations of all R packages used to generate this report.

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.29. 2024. https://github.com/rstudio/rmarkdown.

[2] S. M. Bache and H. Wickham. magrittr: A Forward-Pipe Operator for R. R package version 2.0.3. 2022. https://magrittr.tidyverse.org.

[3] J. Becker, C. Chan, D. Schoch, et al. rio: A Swiss-Army Knife for Data I/O. R package version 1.2.4. 2025. https://gesistsa.github.io/rio/.

[4] C. Boettiger. knitcitations: Citations for Knitr Markdown Files. R package version 1.0.12. 2021. https://github.com/cboettig/knitcitations.

[5] C. Chan, T. J. Leeper, J. Becker, et al. rio: A Swiss-army knife for data file I/O. 2023. https://cran.r-project.org/package=rio.

[6] J. Cheng, C. Sievert, B. Schloerke, et al. htmltools: Tools for HTML. R package version 0.5.8.1. 2024. https://github.com/rstudio/htmltools.

[7] R. Francois and D. Hernangómez. bibtex: Bibtex Parser. R package version 0.5.1. 2023. https://github.com/ropensci/bibtex.

[8] G. Grolemund and H. Wickham. “Dates and Times Made Easy with lubridate”. In: Journal of Statistical Software 40.3 (2011), pp. 1-25. https://www.jstatsoft.org/v40/i03/.

[9] K. Müller and H. Wickham. tibble: Simple Data Frames. R package version 3.2.1. 2023. https://tibble.tidyverse.org/.

[10] Y. Qiu. prettydoc: Creating Pretty Documents from R Markdown. R package version 0.4.1. 2021. https://github.com/yixuan/prettydoc.

[11] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. https://www.R-project.org/.

[12] K. Ren and K. Russell. formattable: Create Formattable Data Structures. R package version 0.2.1. 2021. https://renkun-ken.github.io/formattable/.

[13] V. Spinu, G. Grolemund, and H. Wickham. lubridate: Make Dealing with Dates a Little Easier. R package version 1.9.3. 2023. https://lubridate.tidyverse.org.

[14] H. Wickham. forcats: Tools for Working with Categorical Variables (Factors). R package version 1.0.0. 2023. https://forcats.tidyverse.org/.

[15] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. https://ggplot2.tidyverse.org.

[16] H. Wickham. stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.5.1. 2023. https://stringr.tidyverse.org.

[17] H. Wickham. tidyverse: Easily Install and Load the Tidyverse. R package version 2.0.0. 2023. https://tidyverse.tidyverse.org.

[18] H. Wickham, M. Averick, J. Bryan, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.

[19] H. Wickham, J. Bryan, M. Barrett, et al. usethis: Automate Package and Project Setup. R package version 3.1.0. 2024. https://usethis.r-lib.org.

[20] H. Wickham, W. Chang, L. Henry, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.5.2. 2025. https://ggplot2.tidyverse.org.

[21] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.1.4. 2023. https://dplyr.tidyverse.org.

[22] H. Wickham and L. Henry. purrr: Functional Programming Tools. R package version 1.0.2. 2023. https://purrr.tidyverse.org/.

[23] H. Wickham, J. Hester, and J. Bryan. readr: Read Rectangular Text Data. R package version 2.1.5. 2024. https://readr.tidyverse.org.

[24] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.5. 2022. https://devtools.r-lib.org/.

[25] H. Wickham, D. Vaughan, and M. Girlich. tidyr: Tidy Messy Data. R package version 1.3.1. 2024. https://tidyr.tidyverse.org.

[26] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC, 2016. ISBN: 978-1138700109. https://bookdown.org/yihui/bookdown.

[27] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.44. 2025. https://github.com/rstudio/bookdown.

[28] Y. Xie. Dynamic Documents with R and knitr. 2nd. ISBN 978-1498716963. Boca Raton, Florida: Chapman and Hall/CRC, 2015. https://yihui.org/knitr/.

[29] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014.

[30] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. https://yihui.org/knitr/.

[31] Y. Xie, J. Allaire, and G. Grolemund. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC, 2018. ISBN: 9781138359338. https://bookdown.org/yihui/rmarkdown.

[32] Y. Xie, C. Dervieux, and E. Riederer. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC, 2020. ISBN: 9780367563837. https://bookdown.org/yihui/rmarkdown-cookbook.

[33] H. Zhu. kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.4.0. 2024. http://haozhu233.github.io/kableExtra/.

B Appendix 2

Version information about R, the operating system (OS), and attached or loaded R packages. This appendix was generated using sessionInfo().

## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] rio_1.2.4            lubridate_1.9.3      forcats_1.0.0       
##  [4] stringr_1.5.1        purrr_1.0.2          readr_2.1.5         
##  [7] tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.2       
## [10] tidyverse_2.0.0      devtools_2.4.5       usethis_3.1.0       
## [13] bibtex_0.5.1         knitcitations_1.0.12 htmltools_0.5.8.1   
## [16] prettydoc_0.4.1      magrittr_2.0.3       dplyr_1.1.4         
## [19] kableExtra_1.4.0     formattable_0.2.1    bookdown_0.44       
## [22] rmarkdown_2.29       knitr_1.50          
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6      xfun_0.53         bslib_0.8.0       remotes_2.5.0    
##  [5] htmlwidgets_1.6.4 tzdb_0.4.0        vctrs_0.6.5       tools_4.4.1      
##  [9] generics_0.1.3    fansi_1.0.6       RefManageR_1.4.0  pkgconfig_2.0.3  
## [13] lifecycle_1.0.4   compiler_4.4.1    textshaping_0.4.0 munsell_0.5.1    
## [17] codetools_0.2-20  httpuv_1.6.15     sass_0.4.9        yaml_2.3.10      
## [21] urlchecker_1.0.1  pillar_1.9.0      later_1.4.1       jquerylib_0.1.4  
## [25] ellipsis_0.3.2    cachem_1.1.0      sessioninfo_1.2.3 mime_0.12        
## [29] tidyselect_1.2.1  digest_0.6.37     stringi_1.8.4     grid_4.4.1       
## [33] fastmap_1.2.0     colorspace_2.1-1  cli_3.6.3         pkgbuild_1.4.7   
## [37] utf8_1.2.4        withr_3.0.2       scales_1.3.0      promises_1.3.2   
## [41] backports_1.5.0   timechange_0.3.0  httr_1.4.7        hms_1.1.3        
## [45] memoise_2.0.1     shiny_1.10.0      evaluate_1.0.3    miniUI_0.1.1.1   
## [49] viridisLite_0.4.2 profvis_0.4.0     rlang_1.1.4       Rcpp_1.0.13      
## [53] xtable_1.8-4      glue_1.7.0        xml2_1.3.6        pkgload_1.4.0    
## [57] svglite_2.2.1     rstudioapi_0.16.0 jsonlite_1.8.8    R6_2.5.1         
## [61] plyr_1.8.9        systemfonts_1.3.1 fs_1.6.4